STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset

نویسندگان

  • Yuya Yoshikawa
  • Yutaro Shigeto
  • Akikazu Takeuchi
چکیده

In recent years, automatic generation of image descriptions (captions), that is, image captioning, has attracted a great deal of attention. In this paper, we particularly consider generating Japanese captions for images. Since most available caption datasets have been constructed for English language, there are few datasets for Japanese. To tackle this problem, we construct a large-scale Japanese image caption dataset based on images from MS-COCO, which is called STAIRCaptions. STAIRCaptions consists of 820,310 Japanese captions for 164,062 images. In the experiment, we show that a neural network trained using STAIR Captions can generate more natural and better Japanese captions, compared to those generated using English-Japanese machine translation after generating English captions.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Large-Scale Database of Images and Captions for Automatic Face Naming

We present a large scale database of images and captions, designed for supporting research on how to use captioned images from the Web for training visual classifiers. It consists of more than 125,000 images of celebrities from different fields downloaded from the Web. Each image is associated to its original text caption, extracted from the html page the image comes from. We coin it FAN-Large,...

متن کامل

ISIA at the ImageCLEF 2017 Image Caption Task

This paper describes the details of our methods for participation in the caption prediction task of ImageCLEF 2017. The dataset we use is all provided by the organizers and doesn’t include any external resources. The key components of our framework include a deep model part, an SVM part and a caption retrieval part. In deep model part, we use an end to end architecture with Convolutional neural...

متن کامل

Names and Faces

We show that a large and realistic face dataset can be built from news photographs and their associated captions. Our dataset consists of 44,773 face images, obtained by applying a face finder to approximately half a million captioned news images. This dataset is more realistic than usual face recognition datasets, because it contains faces captured “in the wild” in a variety of configurations ...

متن کامل

Cross-Lingual Image Caption Generation

Automatically generating a natural language description of an image is a fundamental problem in artificial intelligence. This task involves both computer vision and natural language processing and is called “image caption generation.” Research on image caption generation has typically focused on taking in an image and generating a caption in English as existing image caption corpora are mostly ...

متن کامل

The effects of captioning texts and caption ordering on L2 listening comprehension and vocabulary learning

This study investigated the effects of captioned texts on second/foreign (L2) listening comprehension and vocabulary gains using a computer multimedia program. Additionally, it explored the caption ordering effect (i.e. captions displayed during the first or second listening), and the interaction of captioning order with the L2 proficiency level of language learners in listening comprehension a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017